Concept-modulated model-based offline reinforcement learning for rapid generalization
The robustness of any machine learning solution is fundamentally bound by the
data it was trained on. One way to generalize beyond the original training is
through human-informed augmentation of the original dataset; however, it is
impossible to specify all possible failure cases that can occur during
deployment. To address this limitation we combine model-based reinforcement
learning and model-interpretability methods to propose a solution that
self-generates simulated scenarios constrained by environmental concepts and
dynamics learned in an unsupervised manner. In particular, an internal model of
the agent's environment is conditioned on low-dimensional concept
representations of the input space that are sensitive to the agent's actions.
We demonstrate this method within a standard realistic driving simulator in a
simple point-to-point navigation task, where we show dramatic improvements in
one-shot generalization to different instances of specified failure cases as
well as zero-shot generalization to similar variations compared to model-based
and model-free approaches.
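A minimal sketch of the paper's central idea, an internal dynamics model conditioned on low-dimensional concept representations of the input, which can then be unrolled to self-generate simulated scenarios. Linear maps stand in for the learned networks here; all dimensions and names (`concept`, `predict_next`, `rollout`) are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; all names here are illustrative, not from the paper.
OBS_DIM, ACT_DIM, CONCEPT_DIM = 8, 2, 3

# A linear "encoder" standing in for the unsupervised concept extractor:
# it maps raw observations to a low-dimensional concept vector.
W_enc = rng.normal(size=(CONCEPT_DIM, OBS_DIM))

# A linear dynamics model conditioned on (state, action, concept).
W_dyn = rng.normal(size=(OBS_DIM, OBS_DIM + ACT_DIM + CONCEPT_DIM)) * 0.1

def concept(obs):
    """Low-dimensional concept representation of an observation."""
    return np.tanh(W_enc @ obs)

def predict_next(obs, act):
    """One-step prediction from the concept-conditioned internal model."""
    c = concept(obs)
    return W_dyn @ np.concatenate([obs, act, c])

def rollout(obs, actions):
    """Self-generated simulated scenario: unroll the model under a plan."""
    traj = [obs]
    for a in actions:
        traj.append(predict_next(traj[-1], a))
    return np.stack(traj)

obs0 = rng.normal(size=OBS_DIM)
plan = rng.normal(size=(5, ACT_DIM))
print(rollout(obs0, plan).shape)  # (6, 8): initial state + 5 imagined steps
```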
Context Meta-Reinforcement Learning via Neuromodulation
Meta-reinforcement learning (meta-RL) algorithms enable agents to adapt
quickly to tasks from few samples in dynamic environments. Such a feat is
achieved through dynamic representations in an agent's policy network (obtained
via reasoning about task context, model parameter updates, or both). However,
obtaining rich dynamic representations for fast adaptation beyond simple
benchmark problems is challenging due to the burden placed on the policy
network to accommodate different policies. This paper addresses the challenge
by introducing neuromodulation as a modular component that augments a standard
policy network and regulates neuronal activities to produce efficient
dynamic representations for task adaptation. The proposed extension to the
policy network is evaluated across multiple discrete and continuous control
environments of increasing complexity. To prove the generality and benefits of
the extension in meta-RL, the neuromodulated network was applied to two
state-of-the-art meta-RL algorithms (CAVIA and PEARL). The results demonstrate
that meta-RL augmented with neuromodulation produces significantly better
results and richer dynamic representations than the baselines.
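The core mechanism, per-neuron gating of a policy layer's activations by a task-context signal, can be sketched as follows. The multiplicative gating form and all parameter names are illustrative assumptions, not the paper's exact architecture or the CAVIA/PEARL APIs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative sizes; nothing here is taken from the paper's implementation.
IN_DIM, HID_DIM, CTX_DIM = 4, 6, 2

W = rng.normal(size=(HID_DIM, IN_DIM)) * 0.5   # standard policy-layer weights
U = rng.normal(size=(HID_DIM, CTX_DIM)) * 0.5  # neuromodulatory weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neuromodulated_layer(x, context):
    """Standard activations gated multiplicatively by a task-context signal.

    The context vector (e.g. an inferred task embedding) regulates neuronal
    activity, letting a single set of weights express different task-specific
    representations.
    """
    activity = np.tanh(W @ x)    # ordinary feedforward response
    gate = sigmoid(U @ context)  # per-neuron modulation in (0, 1)
    return gate * activity

x = rng.normal(size=IN_DIM)
h_task_a = neuromodulated_layer(x, np.array([1.0, 0.0]))
h_task_b = neuromodulated_layer(x, np.array([0.0, 1.0]))
print(np.allclose(h_task_a, h_task_b))  # False: same input, task-dependent code
```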
Sliced Cramer synaptic consolidation for preserving deeply learned representations
Deep neural networks suffer from an inability to preserve learned data representations (i.e., catastrophic forgetting) in domains where the input data distribution is non-stationary and changes during training. Various selective synaptic
plasticity approaches have recently been proposed to preserve network parameters that are crucial for previously learned tasks while learning new ones.
We explore such selective synaptic plasticity approaches through a unifying lens
of memory replay and show the close relationship between methods like Elastic
Weight Consolidation (EWC) and Memory-Aware-Synapses (MAS). We then propose a fundamentally different class of preservation methods that aim at preserving the distribution of the network’s output at an arbitrary layer for previous tasks
while learning a new one. We propose the sliced Cramér distance as a suitable
choice for such preservation and evaluate our Sliced Cramér Preservation (SCP)
algorithm through extensive empirical investigations on various network architectures in both supervised and unsupervised learning settings. We show that SCP
consistently utilizes the learning capacity of the network better than online-EWC
and MAS methods on various incremental learning tasks.
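The sliced Cramér distance itself is concrete enough to sketch: project samples onto random one-dimensional directions and integrate the squared difference of the two empirical CDFs along each direction. The snippet below is a from-scratch illustration, not the authors' code; comparing a layer's activations before and after new-task training is an assumed, simplified use of the paper's preservation idea:

```python
import numpy as np

rng = np.random.default_rng(2)

def cramer_1d(x, y):
    """Squared Cramer distance between two 1-D empirical distributions:
    the integral of (F_x(t) - F_y(t))**2 over t, computed exactly because
    both empirical CDFs are step functions."""
    grid = np.sort(np.concatenate([x, y]))
    # Empirical CDF values of each sample at every grid point.
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    widths = np.diff(grid)
    return float(np.sum((Fx[:-1] - Fy[:-1]) ** 2 * widths))

def sliced_cramer(A, B, n_projections=64):
    """Average 1-D Cramer distance over random unit directions.

    A, B: (n_samples, dim) activations of some network layer on old-task
    inputs, e.g. before and after learning a new task."""
    dim = A.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=dim)
        theta /= np.linalg.norm(theta)  # random unit direction
        total += cramer_1d(A @ theta, B @ theta)
    return total / n_projections

A = rng.normal(size=(200, 5))
print(sliced_cramer(A, A))            # 0.0: identical distributions
print(sliced_cramer(A, A + 1.0) > 0)  # True: drifted representation penalized
```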
A-EMS: An Adaptive Emergency Management System for Autonomous Agents in Unforeseen Situations
Reinforcement learning agents are unable to respond effectively when faced with novel, out-of-distribution events until they have undergone a significant period of additional training. For lifelong learning agents, which cannot simply be taken offline during this period, suboptimal actions may be taken that can result in unacceptable outcomes. This paper presents the Autonomous Emergency Management System (A-EMS), an online, data-driven, emergency-response method that aims to provide autonomous agents with the ability to react to unexpected situations that are very different from those they have been trained or designed to address. The proposed approach devises a customized response to the unforeseen situation sequentially, by selecting actions that minimize the rate of increase of the reconstruction error from a variational autoencoder. This optimization is achieved online in a data-efficient manner (on the order of 30 to 80 data points) using a modified Bayesian optimization procedure. The potential of A-EMS is demonstrated through emergency situations devised in a simulated 3D car-driving application.
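The selection rule, choosing the action whose predicted outcome least increases a frozen autoencoder's reconstruction error, can be sketched as below. Random search stands in for the paper's modified Bayesian optimization, and the linear "autoencoder" and plant model are toy assumptions introduced only to make the sketch runnable:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-ins (all illustrative): a frozen linear "autoencoder" scores how
# familiar a state looks, and a linear plant maps (state, action) -> next state.
STATE_DIM, ACT_DIM = 6, 2
W_ae = rng.normal(size=(3, STATE_DIM))                    # frozen encoder
plant = rng.normal(size=(STATE_DIM, STATE_DIM + ACT_DIM)) * 0.3

def recon_error(s):
    """Reconstruction error of the frozen autoencoder: high on novel states."""
    s_hat = W_ae.T @ (W_ae @ s)  # encode then decode
    return float(np.sum((s - s_hat) ** 2))

def step(s, a):
    return plant @ np.concatenate([s, a])

def emergency_action(s, n_candidates=60):
    """Pick the action whose predicted next state least increases the
    reconstruction error, i.e. minimizes its rate of increase.

    Random search over a small candidate set stands in for the paper's
    modified Bayesian optimization; the data budget (tens of evaluations)
    is similar in spirit.
    """
    err_now = recon_error(s)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, ACT_DIM))
    rates = [recon_error(step(s, a)) - err_now for a in candidates]
    return candidates[int(np.argmin(rates))]

s = rng.normal(size=STATE_DIM) * 5.0  # an out-of-distribution state
a = emergency_action(s)
print(a.shape)  # (2,)
```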
Deep Reinforcement Learning with Modulated Hebbian plus Q Network Architecture
This paper presents a new neural architecture that combines a modulated
Hebbian network (MOHN) with DQN, which we call modulated Hebbian plus Q network
architecture (MOHQA). The hypothesis is that such a combination allows MOHQA to
solve difficult partially observable Markov decision process (POMDP) problems
which impair temporal difference (TD)-based RL algorithms such as DQN, as the
TD error cannot be easily derived from observations. The key idea is to use a
Hebbian network with bio-inspired neural traces in order to bridge temporal
delays between actions and rewards when confounding observations and sparse
rewards result in inaccurate TD errors. In MOHQA, DQN learns low level features
and control, while the MOHN contributes to the high-level decisions by
associating rewards with past states and actions. Thus the proposed
architecture combines two modules with significantly different learning
algorithms, a Hebbian associative network and a classical DQN pipeline,
exploiting the advantages of both. Simulations on a set of POMDPs and on the
MALMO environment show that the proposed algorithm improved DQN's results and
even outperformed control tests with A2C, QRDQN+LSTM and REINFORCE algorithms
on some POMDPs with confounding stimuli and sparse rewards.
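The trace mechanism that bridges temporal delays between actions and rewards can be sketched as a reward-modulated Hebbian rule with eligibility traces: coincident activity is stored in a decaying trace, and a later reward gates the actual weight change. The constants and update form below are illustrative, not MOHQA's exact equations:

```python
import numpy as np

N_PRE, N_POST = 5, 3

W = np.zeros((N_POST, N_PRE))      # associative weights
trace = np.zeros((N_POST, N_PRE))  # eligibility trace bridging the delay

DECAY, LR = 0.9, 0.1               # illustrative constants

def hebbian_step(pre, post, reward):
    """One update of a reward-modulated Hebbian rule with eligibility traces.

    Coincident pre/post activity accumulates in a decaying trace; when a
    (possibly delayed, sparse) reward arrives, it gates the weight change,
    crediting earlier state-action pairs. This is the general mechanism a
    modulated Hebbian network relies on, not the paper's exact equations.
    """
    global W, trace
    trace = DECAY * trace + np.outer(post, pre)
    W = W + LR * reward * trace

pre = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
post = np.array([0.0, 1.0, 0.0])

hebbian_step(pre, post, reward=0.0)  # activity, no reward yet: trace only
print(np.allclose(W, 0.0))           # True: no weight change without reward
hebbian_step(np.zeros(N_PRE), np.zeros(N_POST), reward=1.0)
print(W[1, 0] > 0)                   # True: delayed reward credits earlier pair
```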
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Despite the advancement of machine learning techniques in recent years,
state-of-the-art systems lack robustness to "real world" events, where the
input distributions and tasks encountered by the deployed systems will not be
limited to the original training context, and systems will instead need to
adapt to novel distributions and tasks while deployed. This critical gap may be
addressed through the development of "Lifelong Learning" systems that are
capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3)
Scalability. Unfortunately, efforts to improve these capabilities are typically
treated as distinct areas of research that are assessed independently, without
regard to the impact of each separate capability on other aspects of the
system. We instead propose a holistic approach, using a suite of metrics and an
evaluation framework to assess Lifelong Learning in a principled way that is
agnostic to specific domains or system techniques. Through five case studies,
we show that this suite of metrics can inform the development of varied and
complex Lifelong Learning systems. We highlight how the proposed suite of
metrics quantifies performance trade-offs present during Lifelong Learning
system development - both the widely discussed Stability-Plasticity dilemma and
the newly proposed relationship between Sample Efficient and Robust Learning.
Further, we make recommendations for the formulation and use of metrics to
guide the continuing development of Lifelong Learning systems and assess their
progress in the future. (To appear in Neural Networks.)
Dose-Dependent Effects of Closed-Loop tACS Delivered During Slow-Wave Oscillations on Memory Consolidation
Sleep is critically important to consolidate information learned throughout the day. Slow-wave sleep (SWS) serves to consolidate declarative memories, a process previously modulated with open-loop non-invasive electrical stimulation, though not always effectively. These failures to replicate could be explained by the fact that stimulation has only been performed in open-loop, as opposed to closed-loop where phase and frequency of the endogenous slow-wave oscillations (SWOs) are matched for optimal timing. The current study investigated the effects of closed-loop transcranial Alternating Current Stimulation (tACS) targeting SWOs during sleep on memory consolidation. 21 participants took part in a three-night, counterbalanced, randomized, single-blind, within-subjects study, investigating performance changes (correct rate and F1 score) on images in a target detection task over 24 h. During sleep, 1.5 mA closed-loop tACS was delivered in phase over electrodes at F3 and F4 and 180° out of phase over electrodes at bilateral mastoids at the frequency (range 0.5–1.2 Hz) and phase of ongoing SWOs for a duration of 5 cycles in each discrete event throughout the night. Data were analyzed in a repeated measures ANOVA framework, and results show that verum stimulation improved post-sleep performance specifically on generalized versions of images used in training at both morning and afternoon tests compared to sham, suggesting the facilitation of schematization of information, but not of rote, veridical recall. We also found a surprising inverted U-shaped dose effect of sleep tACS, which is interpreted in terms of tACS-induced facilitatory and subsequent refractory dynamics of SWO power in scalp EEG. This is the first study showing a selective modulation of long-term memory generalization using a novel closed-loop tACS approach, which holds great potential for both healthy and neuropsychiatric populations.
Functional Role of Neural Oscillations in Attentional Inhibition and Long Term Memory Retrieval
How does the brain selectively retrieve information from long term memory? What neural mechanisms are critical for this process, and how are these mechanisms brought into service in a task dependent way? What are the implications for the representations that are processed through these mechanisms, and can we use our understanding of them to better utilize encoding and retrieval of information in long term memory? These are some of the fundamental questions being addressed in this dissertation. Through the use of neural network models of the hippocampus and surrounding cortex, this dissertation proposes a framework for understanding how time frequency signatures measured at the scalp can be used to track long term memory processes, and make quantitative predictions about how information in long term memory is altered by these processes. The fundamental thesis of this dissertation is that neural oscillations in the theta (3-8 Hz), alpha (8-12 Hz), and beta (12-30 Hz) frequency bands can be tied to specific functional mechanisms supporting long term memory, and that these oscillatory signatures can be tracked in human scalp EEG recordings to predict behavioral changes in the retrieval of items from memory. Specifically, oscillatory power in the theta band positively correlates with how much information the hippocampus is reactivating for a given retrieval event, power in the alpha band positively correlates with how much information is being inhibited from being retrieved, and beta power negatively correlates with how much non-hippocampal dependent information is being retrieved. This thesis is supported by three behavioral experiments, two EEG experiments, and two explorations with a computational neural network model of the hippocampus and surrounding cortex.